A Causal Classification of Orthography Errors in Web Texts

Author

Mirko Tavosanis

Abstract

Errors, even at the spelling level, can provide useful insight into the nature of a written text. This paper presents a classification of spelling errors in Web texts based on their causes (misspellings, typos and intentional deviations), linking them to the attitudes of their authors and the circumstances of their writing. Examples are drawn from blog and forum entries in English and Italian.
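The three-way causal classification can be pictured as a small annotation scheme. The sketch below is only an illustration under stated assumptions: it is not the author's method. The `ErrorCause` enum, the `AnnotatedError` record layout, the `tally_by_cause` helper and the sample annotations are all hypothetical. It shows how manually labelled non-standard forms from blogs and forums might be counted per language and per cause.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class ErrorCause(Enum):
    """The three causal categories named in the abstract (labels are assumed)."""
    MISSPELLING = "misspelling"            # commonly: writer does not know the standard form
    TYPO = "typo"                          # commonly: writer knows the form but mistypes it
    INTENTIONAL = "intentional deviation"  # writer departs from the norm on purpose


@dataclass(frozen=True)
class AnnotatedError:
    """One manually annotated non-standard form (hypothetical record layout)."""
    language: str  # e.g. "en" or "it"
    source: str    # e.g. "blog" or "forum"
    observed: str  # the form as written
    standard: str  # the standard spelling
    cause: ErrorCause


def tally_by_cause(errors):
    """Count annotated errors per (language, cause) pair."""
    return Counter((e.language, e.cause) for e in errors)


# Hypothetical annotations, invented for illustration only (not data from the paper).
sample = [
    AnnotatedError("en", "blog", "teh", "the", ErrorCause.TYPO),
    AnnotatedError("it", "forum", "qual'è", "qual è", ErrorCause.MISSPELLING),
    AnnotatedError("en", "forum", "gonna", "going to", ErrorCause.INTENTIONAL),
]
counts = tally_by_cause(sample)
```

Keying the counter on (language, cause) keeps the English and Italian material separate, so the share of each cause can be compared across the two languages; a `Counter` also returns 0 for combinations that never occur.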


Related articles

An Improvement in Support Vector Machines Algorithm with Imperialism Competitive Algorithm for Text Documents Classification

Due to the exponential growth of electronic texts, their organization and management requires a tool to provide information and data in search of users in the shortest possible time. Thus, classification methods have become very important in recent years. In natural language processing and especially text processing, one of the most basic tasks is automatic text classification. Moreover, text ...


Automatic Identification of Learners' Language Background Based on Their Writing in Czech

The goal of this study is to investigate whether learners’ written data in highly inflectional Czech can suggest a consistent set of clues for automatic identification of the learners’ L1 background. For our experiments, we use texts written by learners of Czech, which have been automatically and manually annotated for errors. We define two classes of learners: speakers of Indo-European languag...


Are Blogs Edited? A Linguistic Survey of Italian Blogs Using Search Engines

Many blogs are written by people with no formal training in public writing; this could suggest a low level of editing and general correctness. A quantitative analysis of misspellings, however, shows that in their orthography Italian blogs are as well revised as conventional Italian newspaper texts. On the other hand, their editing is more careful than the editing of the average of Italian web p...


Design and Implementation of a Software System for Detecting Orthographical or Morphological Errors in Persian Words

This paper presents a new method for analyzing words in the Persian language context to find orthographical and structural errors regardless of the meaning. This technique tokenizes each word in a statement then tries to detect the kind of word, and analyses its correctness in terms of orthography and morphology by means of a lexicon. It should be noted that some words in the Persian language h...


MHSubLex: Using Metaheuristic Methods for Subjectivity Classification of Microblogs

In Web 2.0, people are free to share their experiences, views, and opinions. One of the problems that arises in Web 2.0 is the sentiment analysis of texts produced by users in outlets such as Twitter. One of the main tasks of sentiment analysis is subjectivity classification. Our aim is to classify the subjectivity of Tweets. To this end, we create subjectivity lexicons in which the words into ...



Publication date: 2006
